14 research outputs found
Incremental Dead State Detection in Logarithmic Time
Identifying live and dead states in an abstract transition system is a
recurring problem in formal verification; for example, it arises in our recent
work on efficiently deciding regex constraints in SMT. However,
state-of-the-art graph algorithms for maintaining reachability information
incrementally (that is, as states are visited and before the entire state space
is explored) assume that new edges can be added from any state at any time,
whereas in many applications, outgoing edges are added from each state as it is
explored. To formalize the latter situation, we propose guided incremental
digraphs (GIDs), incremental graphs which support labeling closed states
(states which will not receive further outgoing edges). Our main result is that
dead state detection in GIDs is solvable in amortized time per edge that is
polylogarithmic in the number m of edges, improving upon the O(√m) amortized
time per edge due to Bender, Fineman, Gilbert, and Tarjan (BFGT) for general
incremental directed graphs.
We introduce two algorithms for GIDs: one establishing the logarithmic time
bound, and a second algorithm to explore a lazy heuristics-based approach. To
enable an apples-to-apples experimental comparison, we implemented both
algorithms, two simpler baselines, and the state-of-the-art BFGT baseline using
a common directed graph interface in Rust. Our evaluation shows significant
speedups over BFGT for the largest input graphs over a range of graph classes,
random graphs, and graphs arising from regex benchmarks.
Comment: 22 pages + references
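The GID interface described above can be pictured with a small sketch. The type and method names below are illustrative assumptions, not the paper's actual Rust API, and the dead-state computation is deliberately naive where the paper's algorithms maintain the dead set incrementally:

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical sketch of a guided incremental digraph (GID): edges are added
// only from open states, and a state can be closed, promising it will receive
// no further outgoing edges.
struct Gid {
    edges: HashMap<u32, Vec<u32>>,
    closed: HashSet<u32>,
    dead: HashSet<u32>,
}

impl Gid {
    fn new() -> Self {
        Gid { edges: HashMap::new(), closed: HashSet::new(), dead: HashSet::new() }
    }

    /// Add an outgoing edge from an open (not yet closed) state.
    fn add_edge(&mut self, from: u32, to: u32) {
        assert!(!self.closed.contains(&from), "closed states receive no new edges");
        self.edges.entry(from).or_default().push(to);
        self.edges.entry(to).or_default(); // make sure `to` is known
        self.recompute_dead();
    }

    /// Mark a state closed: it will receive no further outgoing edges.
    fn close(&mut self, state: u32) {
        self.closed.insert(state);
        self.recompute_dead();
    }

    /// A state is dead iff it is closed and every state reachable from it is
    /// also closed (no path can ever escape to an open state). This naive full
    /// recomputation is exactly what the paper's algorithms avoid.
    fn recompute_dead(&mut self) {
        self.dead.clear();
        let states: Vec<u32> = self.edges.keys().copied().collect();
        for s in states {
            let (mut seen, mut stack, mut all_closed) = (HashSet::new(), vec![s], true);
            while let Some(u) = stack.pop() {
                if !seen.insert(u) { continue; }
                if !self.closed.contains(&u) { all_closed = false; break; }
                if let Some(ns) = self.edges.get(&u) {
                    stack.extend(ns.iter().copied());
                }
            }
            if all_closed { self.dead.insert(s); }
        }
    }

    fn is_dead(&self, state: u32) -> bool {
        self.dead.contains(&state)
    }
}

fn main() {
    let mut g = Gid::new();
    g.add_edge(1, 2); // explore state 1, discover an edge to 2
    g.close(2);       // state 2 fully explored, no outgoing edges: dead
    println!("2 dead: {}", g.is_dead(2)); // true: a closed sink
    println!("1 dead: {}", g.is_dead(1)); // false: 1 is still open
    g.close(1);
    println!("1 dead: {}", g.is_dead(1)); // true: all reachable states closed
}
```

The point of the closed-state promise is visible in `recompute_dead`: once every state reachable from `s` is closed, no future edge can revive `s`, so "dead" is a stable label.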
Automata-Based Stream Processing
We propose an automata-theoretic framework for modularly expressing computations on streams of data. With weighted automata as a starting point, we identify three key features that are useful for an automaton model for stream processing: expressing the regular decomposition of streams whose data items are elements of a complex type (e.g., tuples of values), allowing the hierarchical nesting of several different kinds of aggregations, and specifying modularly the parallel execution and combination of various subcomputations. The combination of these features leads to subtle efficiency considerations that concern the interaction between nondeterminism, hierarchical nesting, and parallelism. We identify a syntactic restriction where the nondeterminism is unambiguous and parallel subcomputations synchronize their outputs. For automata satisfying these restrictions, we show that there is a space- and time-efficient streaming evaluation algorithm. We also prove that when these restrictions are relaxed, the evaluation problem becomes inherently computationally expensive.
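As a rough illustration of the kind of bounded-space, composable aggregation the model targets, here is a hedged Rust sketch (the names and design are assumptions for exposition, not the paper's formalism) of stream aggregators and a parallel combinator that runs two subcomputations over the same stream:

```rust
// An aggregator consumes stream items one at a time in bounded space.
trait Agg<I> {
    type Out;
    fn update(&mut self, item: &I);
    fn output(&self) -> Self::Out;
}

// Running sum over an integer stream.
struct Sum(i64);
impl Agg<i64> for Sum {
    type Out = i64;
    fn update(&mut self, x: &i64) { self.0 += *x; }
    fn output(&self) -> i64 { self.0 }
}

// Running maximum over an integer stream.
struct Max(i64);
impl Agg<i64> for Max {
    type Out = i64;
    fn update(&mut self, x: &i64) { self.0 = self.0.max(*x); }
    fn output(&self) -> i64 { self.0 }
}

// Parallel combination: both subcomputations see every item, and their
// outputs are synchronized into a pair, echoing the "parallel execution
// and combination of subcomputations" feature.
struct Both<A, B>(A, B);
impl<I, A: Agg<I>, B: Agg<I>> Agg<I> for Both<A, B> {
    type Out = (A::Out, B::Out);
    fn update(&mut self, x: &I) { self.0.update(x); self.1.update(x); }
    fn output(&self) -> Self::Out { (self.0.output(), self.1.output()) }
}

fn main() {
    let mut both = Both(Sum(0), Max(i64::MIN));
    for x in [3i64, 1, 4] {
        both.update(&x);
    }
    println!("{:?}", both.output()); // (8, 4)
}
```

Note that `Both` keeps only the two subcomputations' states, so space stays constant in the stream length, which is the kind of efficiency the abstract's restrictions are designed to guarantee.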
Safe Programming Over Distributed Streams
The sheer scale of today's data processing needs has led to a new paradigm of software systems centered around requirements for high-throughput, distributed, low-latency computation. Despite their widespread adoption, existing solutions have yet to provide a programming model with safe semantics -- and they disagree on basic design choices, in particular with their approach to parallelism. As a result, naive programmers are easily led to introduce correctness and performance bugs.
This work proposes a reliable programming model for modern distributed stream processing, founded in a type system for partially ordered data streams. On top of the core type system, we propose language abstractions for working with streams -- mechanisms to build stream operators with (1) type-safe compositionality, (2) deterministic distribution, (3) run-time testing, and (4) static performance bounds. Our thesis is that viewing streams as partially ordered conveniently exposes parallelism without compromising safety or determinism. The ideas contained in this work are implemented in a series of open source software projects, including the Flumina, DiffStream, and Data Transducers libraries.
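To make the partial-order idea concrete, here is a hedged Rust sketch (an illustration under assumed names, not the Flumina, DiffStream, or Data Transducers API): items with the same key stay ordered, items with different keys are unordered and may be processed in parallel, and determinism comes from a combine step that is insensitive to cross-key interleaving:

```rust
use std::collections::BTreeMap;

// Model a partially ordered stream as a sequence of (key, value) items:
// same-key items are totally ordered, different-key items are unordered.
// The per-key fold respects the order within each key.
fn per_key_fold(stream: &[(u32, i64)]) -> BTreeMap<u32, i64> {
    let mut state = BTreeMap::new();
    for &(k, v) in stream {
        *state.entry(k).or_insert(0) += v;
    }
    state
}

// The cross-key combine must not depend on how unordered items were
// interleaved; summation is commutative, so the result is deterministic
// no matter how the per-key work was parallelized.
fn combine(state: &BTreeMap<u32, i64>) -> i64 {
    state.values().sum()
}

fn main() {
    let state = per_key_fold(&[(1, 2), (2, 3), (1, 5)]);
    println!("per-key: {:?}", state);      // {1: 7, 2: 3}
    println!("total: {}", combine(&state)); // 10
}
```

The safety claim in the abstract corresponds to the type-level discipline hinted at here: as long as operators only observe the order the stream actually guarantees, parallel execution cannot introduce nondeterminism.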
Context Directed Reversals on Permutations and Graphs
Efficient information processing is fundamental to activities stretching from genome maintenance to data management. This project analyzes the nature of, and unusual efficiency in, sorting information by an elaborate genome maintenance system. Single-cell organisms called ciliates host an encrypted copy of their genome in a micronucleus. Their genome maintenance system often replaces the current functional genome by decrypting an encrypted copy.
Decryption is performed through permutation sorting, using context directed reversals (cdr) and context directed block swaps (cds). The decryption mechanism has computational power and is programmable, giving compelling reasons to examine its mathematical properties. Generalizing several prior results, we identify the set of all signed permutations that are sortable by applications of cdr and cds. The methods used in this investigation are from the mathematical fields of algebra, combinatorics, graph theory, and low-dimensional topology.
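The core rewriting step behind cdr can be shown in a short sketch. The Rust function below is an illustrative assumption for exposition only: it applies a signed reversal (reverse a segment and flip the signs of its entries), which is the operation cdr performs on the block between matching pointer contexts; here the segment bounds are supplied directly rather than located via pointers:

```rust
// Reverse the segment perm[i..=j] of a signed permutation and negate each
// entry in it. In cdr, the segment would be delimited by matching pointer
// contexts; supplying i and j directly keeps the sketch minimal.
fn signed_reversal(perm: &mut [i32], i: usize, j: usize) {
    perm[i..=j].reverse();
    for k in i..=j {
        perm[k] = -perm[k];
    }
}

fn main() {
    let mut p = vec![3, -1, 2];
    signed_reversal(&mut p, 0, 1);
    println!("{:?}", p); // [1, -3, 2]
}
```

Sorting a signed permutation means reaching the identity with all positive signs; the question the abstract answers is exactly which signed permutations can reach it using only such context-directed steps.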
Diastereoselective Hydrolysis of Branched Malonate Diesters by Porcine Liver Esterase: Synthesis of 5-Benzyl-Substituted Cα-Methyl-β-Proline and Catalytic Evaluation
Malonate diesters with highly branched side chains containing a preexisting chiral center were prepared from optically pure amino alcohols and subjected to asymmetric enzymatic hydrolysis by Porcine Liver Esterase (PLE). Recombinant PLE isoenzymes have been utilized in this work to synthesize diastereomerically enriched malonate half-esters from enantiopure malonate diesters. The diastereomeric excess of the product half-esters was further improved in the later steps of synthesis either by simple recrystallization or flash column chromatography. The diastereomerically enriched half-ester was transformed into a novel 5-substituted Cα-methyl-β-proline analogue, (3R,5S)-1c, in high optical purity employing a stereoselective cyclization methodology. This β-proline analogue was tested for activity as a catalyst of the Mannich reaction. The β-proline analogue derived from the hydrolysis reaction by the crude PLE appeared to catalyze the Mannich reaction between an α-imino ester and an aldehyde, providing decent to good diastereoselectivities. However, the enantioselectivity in the reaction was low. The second diastereomer of the 5-benzyl-substituted Cα-methyl-β-proline, (3S,5S)-1c, was prepared by enzymatic hydrolysis using PLE isoenzyme 3 and tested for its catalytic activity in the Mannich reaction. Amino acid (3S,5S)-1c catalyzed the Mannich reaction between isovaleraldehyde and an α-imino ester, yielding the anti-selective product with an optical purity of 99% ee.